QUANTILE PYRAMIDS FOR BAYESIAN NONPARAMETRICS

By Nils Lid Hjort and Stephen G. Walker

University of Oslo and University of Kent 

Polya trees fix partitions and use random probabilities in order 
to construct random probability measures. With quantile pyramids 
we instead fix probabilities and use random partitions. For nonpara- 
metric Bayesian inference we use a prior which supports piecewise 
linear quantile functions, based on the need to work with a finite set 
of partitions, yet we show that the limiting version of the prior exists. 
We also discuss and investigate an alternative model based on the so- 
called substitute likelihood. Both approaches factorize in a convenient 
way leading to relatively straightforward analysis via MCMC, since 
analytic summaries of posterior distributions are too complicated. 
We give conditions securing the existence of an absolutely continuous
quantile process, and discuss consistency and approximate normality 
for the sequence of posterior distributions. Illustrations are included. 

1. Introduction and summary. Constructing manageable classes of ran- 
dom probability measures is at the heart of nonparametric Bayesian method- 
ology. Recent surveys of Bayesian nonparametric methods, including de- 
scription of several such classes of random distributions, have been given 
in Walker et al. (1999) and Hjort (2003). The aim of the present article is 
to introduce and investigate one more such class, namely that of quantile 
pyramids. 

One attempt to construct a random probability measure on [0, 1] is via 
so-called Polya trees. This relies on the idea of a fixed binary tree partition 
of [0, 1] and a strategy for allocating random mass to these partitions. The 
original and clearest exposition is provided by Ferguson (1974). More recent 
work on Polya trees has been done by Lavine (1992, 1994). Inference is 
attractively simple since, given an independent and identically distributed 
set of observations, the posterior is also a Polya tree and the update is
straightforward. A drawback to Polya trees, and perhaps the main reason 
why they have not seen much application within the Bayesian nonparametric 
literature, is that an arbitrary partition tree of [0, 1] needs to be specified. 
There is no obvious selection criterion, though on [0, 1] the dyadic intervals 
are the natural choice. No partition is "right," however, and two different 
partitions produce two different answers. No satisfactory solution to this 
problem can be anticipated. 

The fundamental idea of the Polya tree is a fixed partition and random 
mass. We turn this around and instead use the idea of fixed mass and random 
partitions. The arbitrariness is now lost as the quantiles form a nonarbitrary partition of mass. For a distribution with cumulative function $F$ on $\mathbb{R}$, the quantile function is

$$Q(y) = F^{-1}(y) = \inf\{t : F(t) \ge y\} \quad \text{for } 0 < y < 1. \tag{1}$$

Our program is to construct random probability distributions $F$ via their quantile functions $Q$, using $F(x) = \sup\{y : Q(y) \le x\}$. Specifically, the first random partition at $Q(\tfrac12)$ corresponds to the median and the fixed mass of $\tfrac12$ is allocated in equal measure to $[0, Q(\tfrac12))$ and $[Q(\tfrac12), 1]$. The random partitions at $Q(\tfrac14)$ and $Q(\tfrac34)$ on the second level determine the quartiles and the fixed mass of $\tfrac14$ is allocated to the relevant intervals. At stage three we draw the octiles $Q(j/8)$ for $j = 1, 3, 5, 7$. In general, at level $m$, we draw quantiles $Q(j/2^m)$ for $j = 1, 3, \ldots, 2^m - 1$. Even though more general probabilistic constructions could be envisaged, we focus on those pyramidal schemes where $Q(j/2^m)$ for $j = 1, 3, \ldots, 2^m - 1$ are drawn independently, conditional on the values generated at level $m - 1$ above, with $Q(j/2^m) \in [Q((j-1)/2^m), Q((j+1)/2^m))$.

A mild disadvantage of our quantile trees is that the prior to posterior 
computation is not analytically tractable, or at any rate less so than for 
Polya trees. However, with the recent advent of simulation based inference 
the need for clear-cut conjugacy and analytically tractable posteriors is no 
longer critical. We shall rely on simulation strategies to collect samples from 
the posterior distribution. Therefore, we do not see the lack of analytical 
tractability as a problem and we have removed the need to specify an arbi- 
trary partition. The allocation of the fixed quantile masses to the random 
partitions is the obvious choice, since they are instantly recognizable and 
interpretable. 

While nonparametric priors are typically difficult to manipulate, in the 
sense that the incorporation of real qualitative prior information is nontriv- 
ial, we believe the contrary is true for quantile pyramids. The significance 
of quantiles is well understood and hence assigning a prior to the median, 
quartiles, etc. should be relatively straightforward. There are instances in 
the literature suggesting that more of statistics, from modeling to analysis
and interpretation, should be carried out using quantiles; see, for example, 
Parzen (2004, 1979). 

The layout of the paper is as follows. In Section 2 we introduce the quantile 
pyramid process on [0, 1] . In particular we discuss issues of existence and 
continuity. That the pyramid schemes have a large nonparametric support 
is demonstrated in Section 3. In Section 4 we consider, in particular, the 
Beta quantile pyramid. 

In Section 5 we proceed with Bayesian inference associated with quantile 
pyramids. First we use the quantile pyramids to construct a prior on the 
space of piecewise linear quantile functions. We undertake exact posterior 
inference for such priors for any finite level of the pyramid. We also con- 
sider a multinomial type pseudo-likelihood function for the quantiles, and 
investigate the implied pseudo-posterior distribution of the parameters of a 
quantile pyramid. The pseudo-likelihood function in question is a natural 
generalization of a suggestion of Jeffreys (1967), Section 4.4, concerning the 
median parameter, and is sometimes called the substitution likelihood; cf. 
Lavine (1995) and Dunson and Taylor (2005). 

Then in Section 6 we work out the structure of the posterior quantile pyra- 
mids, given a set of independent data points. It is shown that the likelihood 
functions factorize in precisely the same way as the quantile pyramid priors, 
leading to simplifications of the posteriors. We demonstrate how to obtain 
summaries from the posterior quantile pyramid via MCMC algorithms. In 
Sections 7 and 8 results about the large-sample behavior of the posterior distributions are obtained; in particular Bernshtein-von Mises type theorems
are proved under natural conditions. Finally, in Section 9 we provide a brief 
discussion with concluding remarks. 

2. Quantile pyramid processes. This section considers ways of assigning a probability distribution to the full quantile process, and investigates conditions under which it is absolutely continuous. For simplicity of presentation we work on the unit interval, and consider therefore processes $\{Q(y) : 0 \le y \le 1\}$ with $Q(0) = 0$ and $Q(1) = 1$. Such a $Q$ process is linked to a cumulative distribution function $F$ via (1). Note that $Q$ is the left-continuous inverse of the right-continuous $F$, and that $Q(y) \le x$ if and only if $y \le F(x)$. This somewhat nontrivial equivalence is valid also for cases where $F$ has jumps; see, for example, Shorack and Wellner (1986), Chapter 1.1.
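The equivalence between $Q(y) \le x$ and $y \le F(x)$ can be checked numerically on a distribution with jumps. The following is a minimal sketch, assuming a hypothetical two-atom distribution (`ATOMS`) that is not taken from the paper; the names `cdf` and `quantile` are illustrative only, and exact rational arithmetic is used so that boundary cases are not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical discrete distribution on [0, 1]: (atom, mass) pairs, sorted by atom.
ATOMS = [(Fraction(1, 4), Fraction(1, 2)), (Fraction(3, 4), Fraction(1, 2))]


def cdf(x):
    """Right-continuous F(x) = total mass of atoms at or below x."""
    return sum(mass for atom, mass in ATOMS if atom <= x)


def quantile(y):
    """Left-continuous inverse Q(y) = inf{t : F(t) >= y}, for 0 < y <= 1."""
    total = Fraction(0)
    for atom, mass in ATOMS:
        total += mass
        if total >= y:
            return atom
    raise ValueError("y must lie in (0, 1]")


def equivalence_holds(grid_size=8):
    """Check Q(y) <= x  <=>  y <= F(x) on a dyadic grid of x and y values."""
    ys = [Fraction(i, grid_size) for i in range(1, grid_size + 1)]
    xs = [Fraction(i, grid_size) for i in range(0, grid_size + 1)]
    return all((quantile(y) <= x) == (y <= cdf(x)) for y in ys for x in xs)
```

The grid deliberately includes the jump points $x = 1/4$ and $x = 3/4$ and the level $y = 1/2$, which is where a strict-versus-weak inequality mistake would show up.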

2.1. General pyramid quantile processes. Consider a quantile process down to level $m$, involving random quantiles $Q(j/2^m)$ for $j = 1, \ldots, 2^m - 1$. We say that $Q$ is a pyramid quantile process down to this level if these $2^m - 1$ quantiles have been generated by successive conditionally independent mechanisms down to level $m$. More specifically, this corresponds to having the median $Q(\tfrac12)$ drawn from some density $\pi_{1,1}$ on $[0, 1]$; then the two quartiles $Q(j/4)$ for $j = 1, 3$ drawn independently from two densities $\pi_{2,1}, \pi_{2,3}$ concentrated on respectively $[0, Q(\tfrac12)]$ and $[Q(\tfrac12), 1]$; then the four remaining octiles $Q(j/8)$ for $j = 1, 3, 5, 7$ independent from four level-three distributions $\pi_{3,j}$ confined to the appropriate intervals $[0, Q(\tfrac14)]$, $[Q(\tfrac14), Q(\tfrac12)]$, $[Q(\tfrac12), Q(\tfrac34)]$, $[Q(\tfrac34), 1]$; and so on. The simultaneous density of the $2^m - 1$ quantiles can therefore be represented as

$$\pi_m(q) = \prod_{l=1}^{m} \prod_{j\ \mathrm{odd},\ j < 2^l} \pi_{l,j}\bigl(q_{j/2^l} \mid q_{(j-1)/2^l}, q_{(j+1)/2^l}\bigr), \tag{2}$$

with $q_0 = 0$ and $q_1 = 1$, where the parents of $Q(j/2^l)$ are $Q((j \pm 1)/2^l)$, both of whom were created in previous generations.

2.2. Existence and absolute continuity. We now examine the quantile pyramid building process in some more detail, where variables at level $m$ are generated after those of level $m - 1$. At this level,

$$Q_m(j/2^m) = Q_{m-1}((j-1)/2^m)(1 - V_{m,j}) + Q_{m-1}((j+1)/2^m)\,V_{m,j} \tag{3}$$

for $j = 1, 3, 5, \ldots, 2^m - 1$, in terms of independent variables $V_{m,j}$ at work at level $m$ of the process. Note that variables $V_{m,j}$ at level $m$ are allowed to depend on previous generations' $V_{m',j'}$ for $m' \le m - 1$. We define $Q_m$ on the full unit interval by linear interpolation outside the $j/2^m$ points, and with $Q_m(0) = 0$, $Q_m(1) = 1$. Under various sets of conditions there will be a well-defined process $Q$ to which $Q_m$ converges in distribution, in the space $D_l[0, 1]$ of left-continuous functions with right-hand limits on the unit interval, equipped with the Skorohod topology; see Billingsley (1968), Chapter 4, for definitions. We shall outline two arguments that can be used to establish existence of and convergence to $Q$.
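The recursion (3), together with the linear interpolation rule, translates directly into a simulation routine. The sketch below is only an illustration under stated assumptions: the $V_{m,j}$ are taken to be independent symmetric Beta variables with a level-dependent parameter, which is one choice among many, and the function names `sample_pyramid` and `evaluate` and the argument `a_of_level` are hypothetical.

```python
import random


def sample_pyramid(levels, a_of_level, rng):
    """Draw Q_m on the grid j/2^m, j = 0, ..., 2^m, via recursion (3).

    Assumption: V_{m,j} ~ Beta(a_m/2, a_m/2) independently, a_m = a_of_level(m).
    """
    q = [0.0, 1.0]  # level 0: only the fixed end points Q(0) = 0, Q(1) = 1
    for m in range(1, levels + 1):
        half_a = 0.5 * a_of_level(m)
        refined = [q[0]]
        for left, right in zip(q[:-1], q[1:]):
            v = rng.betavariate(half_a, half_a)   # V_{m,j}
            mid = left * (1.0 - v) + right * v    # recursion (3)
            refined.extend([mid, right])
        q = refined
    return q


def evaluate(q, y):
    """Linear interpolation of the grid values, defining Q_m on all of [0, 1]."""
    k = len(q) - 1
    pos = min(int(y * k), k - 1)
    frac = y * k - pos
    return q[pos] * (1.0 - frac) + q[pos + 1] * frac
```

For example, `sample_pyramid(4, lambda m: 2.5 * m ** 3, random.Random(0))` returns 17 nondecreasing grid values from 0 to 1; each new level keeps the previously drawn quantiles and inserts one new point inside every existing cell.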

The first line of arguments uses martingales. For simplicity of presentation assume now that the $V_{m,j}$'s of (3) all have mean $\tfrac12$; more general results follow with additional efforts. Then, for each $y$, $Q_m(y)$ forms a martingale sequence with respect to the history up to and including the parents, and $\mathrm{E}\,Q_m(y) = y$. Hence there is a limit $Q(y)$ to which $Q_m(y)$ converges with probability 1. Clearly, the limit $Q$ is nondecreasing, with $Q(0) = 0$ and $Q(1) = 1$, that is, a random quantile function. We note that $Q$ in some cases might not be continuous almost surely. Such martingale arguments are pursued further in Propositions 2.2 and 2.3 below.

The second line of arguments involves tightness and is less immediate, but provides more information, in particular continuity, when the criterion we develop now applies. The $V_{m,j}$ variables of (3) are now allowed to be fully general, without the mean $\tfrac12$ constraint.

Proposition 2.1. Assume that

$$\Delta_m = \max_{j \le 2^m} \{Q_m(j/2^m) - Q_m((j-1)/2^m)\} \to_p 0.$$

Then there is a well-defined random continuous quantile process $Q$ to which $Q_m$ converges, in the space $C[0, 1]$ of continuous functions on the unit interval, equipped with the uniform topology.

Proof. The crux is that the condition given implies tightness of the $\{Q_m\}$ sequence in the $C[0, 1]$ space, as we demonstrate in the next paragraph. Given the tightness, Prokhorov's theorem secures the existence of a subsequence converging in distribution to a limit process $Q$, which also must be continuous; see Billingsley (1968), Chapter 2. The values of this limit process at dyadic points are identical to those of $Q_m$. By denseness of dyadic points it follows that also other subsequences must have the same limit, proving that $Q$ is the limit process of $Q_m$.

To prove tightness it suffices by the theory of Billingsley (1968), Chapter 2, to show that for each positive $\varepsilon$ and $\varepsilon'$, there is a $\delta$ such that

$$\Pr\{\omega(Q_m, \delta) \ge \varepsilon\} \le \varepsilon' \quad \text{for all large } m,$$

where $\omega(Q_m, \delta)$ is the maximum of all $\delta$-increments $Q_m(y') - Q_m(y)$ with $0 \le y' - y \le \delta$. Now let $\delta = (\tfrac12)^m$. For such $y$ and $y'$, find dyadic neighbors with $i/2^m \le y \le y' \le j/2^m$, where $j - i$ is at most 2. Hence, for all $m' \ge m$,

$$Q_{m'}(y') - Q_{m'}(y) \le Q_{m'}(j/2^m) - Q_{m'}(i/2^m) \le 2\Delta_m,$$

using that $Q_{m'}$ is equal to $Q_m$ at all $j/2^m$ points. It follows that

$$\Pr\{\omega(Q_{m'}, (\tfrac12)^m) \ge \varepsilon\} \le \Pr\{2\Delta_m \ge \varepsilon\} \quad \text{for all } m' \ge m,$$

proving tightness under the $\Delta_m \to_p 0$ condition. □

We now show that the condition of Proposition 2.1 is very easily fulfilled. Assume, for example, that all the $V_{m,j}$ variables are independent, and that all $\mathrm{E}\,V_{m,j}^2$ and $\mathrm{E}(1 - V_{m,j})^2$ are bounded by some $c < \tfrac12$; if, for example, the $V_{m,j}$ is Beta with parameters $(\tfrac12 a_m, \tfrac12 a_m)$, then the condition holds provided only that the $a_m$'s stay away from zero. Note that $Q_m(j/2^m) - Q_m((j-1)/2^m)$ may be expressed as a generic product $V_1^{e_1} \cdots V_m^{e_m}$, with $V_1$ from the first generation, $V_2$ from the second, etc.; and where $e_i$ is 0 or 1, writing on this occasion $V^0 = 1 - V$ and $V^1 = V$. See Figure 1. See also (5) below. This leads to

$$\Pr\{\Delta_m \ge \varepsilon\} \le \sum_{j \le 2^m} \Pr\{Q_m(j/2^m) - Q_m((j-1)/2^m) \ge \varepsilon\} \le \sum_{j \le 2^m} (1/\varepsilon^2)\,\mathrm{E}\,(V_1^{e_1})^2 \cdots (V_m^{e_m})^2 \le (1/\varepsilon^2)(2c)^m,$$

showing that Proposition 2.1 applies. Similarly, if all $V_{m,j}$'s have their means inside $[0.293, 0.707]$ and if their variances go to zero, then the $\Delta_m \to_p 0$ condition holds, implying again a continuous quantile limit process $Q(y)$. Examples of such random $Q(y)$ curves are presented in Figure 1.

Fig. 1. Random $Q(y)$ curves generated from the same quantile pyramid process, using independent $V_{m,j}$'s drawn from Beta distributions $(\tfrac12 a_m, \tfrac12 a_m)$, with $a_m = c m^3$, for $c = 2.5$. This corresponds to the Beta quantile pyramid discussed in Section 4, and produces absolutely continuous $Q(y)$.

The behavior of the $Q$ process depends crucially on aspects of the $V_{m,j}$ variables. Now we focus on conditions securing smoothness of the $Q$, and for which we must demand more than Proposition 2.1. Consider therefore the derivative of $Q_m$ at level $m$, which exists outside the $j/2^m$ points;

$$q_m(y) = \{Q_m(j/2^m) - Q_m((j-1)/2^m)\}/(\tfrac12)^m \quad \text{on } ((j-1)/2^m, j/2^m). \tag{4}$$

We wish to establish conditions under which this quantile density function converges to a random function which may be represented as the derivative of $Q$. For illustration, take $m = 3$, where we may write



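$$q_3(y) = \begin{cases}
8\,V_{1,1} V_{2,1} V_{3,1}, & \text{for } y \in (0, 1/8), \\
8\,V_{1,1} V_{2,1} (1 - V_{3,1}), & \text{for } y \in (1/8, 2/8), \\
8\,V_{1,1} (1 - V_{2,1}) V_{3,3}, & \text{for } y \in (2/8, 3/8), \\
8\,V_{1,1} (1 - V_{2,1}) (1 - V_{3,3}), & \text{for } y \in (3/8, 4/8), \\
8\,(1 - V_{1,1}) V_{2,3} V_{3,5}, & \text{for } y \in (4/8, 5/8), \\
8\,(1 - V_{1,1}) V_{2,3} (1 - V_{3,5}), & \text{for } y \in (5/8, 6/8), \\
8\,(1 - V_{1,1}) (1 - V_{2,3}) V_{3,7}, & \text{for } y \in (6/8, 7/8), \\
8\,(1 - V_{1,1}) (1 - V_{2,3}) (1 - V_{3,7}), & \text{for } y \in (7/8, 1).
\end{cases} \tag{5}$$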



We shall see that increased tightness of the $V_{m,j}$'s around $\tfrac12$ as $m$ grows is the key to a well-behaved limit of $q_m(y)$. In fact we shall now state and prove two results securing existence of an absolutely continuous limiting quantile process $Q$, the first for the symmetric case where the $V_{m,j}$'s have mean $\tfrac12$ and the second for the nonsymmetric case.
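The product structure of the quantile density displayed in (5) extends to any level: each cell of level $m - 1$ splits in two, and its density value is multiplied by $2V$ on the left half and by $2(1 - V)$ on the right half. A minimal sketch of this bookkeeping follows; the function name `quantile_density` and the nested-list input format are assumptions made for the illustration, not notation from the paper.

```python
def quantile_density(v_levels):
    """Return the 2^m values of q_m on the dyadic cells, ordered left to right.

    v_levels[l] holds the weights V_{l+1,j} for j = 1, 3, ..., 2^(l+1) - 1,
    ordered left to right, so it must contain 2^l numbers in [0, 1].
    """
    dens = [1.0]  # level 0: Q_0(y) = y has constant derivative 1
    for vs in v_levels:
        if len(vs) != len(dens):
            raise ValueError("each level needs one weight per existing cell")
        nxt = []
        for d, v in zip(dens, vs):
            # The left child carries factor 2V, the right child 2(1 - V).
            nxt.extend([2.0 * d * v, 2.0 * d * (1.0 - v)])
        dens = nxt
    return dens
```

With hypothetical weights `[[0.6], [0.5, 0.4], [0.5, 0.5, 0.5, 0.25]]` the first entry equals $8 \cdot 0.6 \cdot 0.5 \cdot 0.5 = 1.2$ and the last equals $8 \cdot 0.4 \cdot 0.6 \cdot 0.75 = 1.44$, matching the first and last rows of (5); the eight values average to 1, as they must for a quantile density on the unit interval.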



Proposition 2.2. Assume that the variables $V_{m,j}$ of (3) involved at level $m$ are such that $\mathrm{E}(V_{m,j} \mid \mathcal{F}_{m-1}) = \tfrac12$ and $\mathrm{Var}(V_{m,j} \mid \mathcal{F}_{m-1}) \le \sigma_m^2$ for each $j$, with $\sum_{m=1}^\infty \sigma_m^2$ finite, where $\mathcal{F}_{m-1}$ represents all previous $V_{m',j'}$ with $m' \le m - 1$. Then with probability 1 there is a function $q(y)$ which is the a.e. limit of $q_m(y)$, and for which $Q(y) = \int_0^y q(u)\,du$ for $0 \le y \le 1$.

Proof. As seen above, each increment $Q_m(j/2^m) - Q_m((j-1)/2^m)$ at level $m$ may be represented as a product $V_1^{e_1} \cdots V_m^{e_m}$, where $e_j$ is 0 or 1, and $V^0 = 1 - V$, $V^1 = V$. Hence $q_m(y)$ may be presented as $Z_m = W_1 \cdots W_m$ with $W_j = 2V_j^{e_j}$ having mean 1. Thus the martingale convergence theorem applies and leads to the existence of a limit $q(y)$, regardless of the variances.

The finiteness of $\sum_{m=1}^\infty \sigma_m^2$ is, however, needed in order to secure that $Q(y)$ is the integral of $q(y)$. The variance of $Z_m$ above is $\mathrm{E}\,W_1^2 \cdots W_m^2 - 1$, which via conditional expectations and $\mathrm{E}(W_m^2 \mid \mathcal{F}_{m-1}) \le 1 + 4\sigma_m^2$ is seen to be bounded by $\prod_{r=1}^m (1 + 4\sigma_r^2) - 1$, which again is bounded by the constant $\exp(4\sum_{m=1}^\infty \sigma_m^2) - 1$. In particular, $\int_0^1 \mathrm{Var}\,q_m(y)\,dy$ is bounded as $m$ grows. The required statement follows from the corollary of Kraft (1964). The point to note is that as far as independence of the $V_{m,j}$'s is concerned, it is their conditional independence given $\mathcal{F}_{m-1}$ which actually matters, and with this the theorem and corollary of Kraft (1964) still hold, since $Q$ behaves as a distribution function with probability 1. □

The above quantile processes with $\mathrm{E}(V_{m,j} \mid \mathcal{F}_{m-1}) = \tfrac12$ all have $\mathrm{E}\,Q(y) = y$ for $y \in (0, 1)$, that is, are centered at the uniform quantile function. In Bayesian practice one needs to be able to center priors at given positions, that is, to adjust the prior to match a given prior guess distribution, say $Q_{\mathrm{null}}(y) = F_{\mathrm{null}}^{-1}(y)$. This takes nonsymmetric $V_{m,j}$'s. One needs in fact

$$\mathrm{E}\,V_{m,j} = \frac{Q_{\mathrm{null}}(j/2^m) - Q_{\mathrm{null}}((j-1)/2^m)}{Q_{\mathrm{null}}((j+1)/2^m) - Q_{\mathrm{null}}((j-1)/2^m)} \quad \text{for } j = 1, 3, \ldots, 2^m - 1 \tag{6}$$

at level $m$, as is seen from representation (3), in order to achieve $\mathrm{E}\,Q_m(y) = Q_{\mathrm{null}}(y)$. If the distribution with $Q_{\mathrm{null}}$ as quantile function has a density with two derivatives, an approximation to the mean above is

$$\tfrac12 - \tfrac14 \{Q_{\mathrm{null}}''(y)/Q_{\mathrm{null}}'(y)\}(\tfrac12)^m,$$

at $y = j/2^m$. The following proposition demonstrates that such nonsymmetric setups also give absolutely continuous quantile pyramids, provided the variances become small enough.
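The required means of the $V_{m,j}$'s are simple ratios of increments of the prior guess quantile function, so they can be tabulated level by level. The sketch below assumes the null quantile function is available as a Python callable on $[0, 1]$; the names `centering_means` and `approx_mean`, and the example $Q_{\mathrm{null}}(y) = y^2$, are hypothetical choices for illustration.

```python
def centering_means(q_null, m):
    """Means E V_{m,j}, j = 1, 3, ..., 2^m - 1, that center level m at q_null."""
    h = 0.5 ** m
    means = []
    for j in range(1, 2 ** m, 2):
        lo, mid, hi = q_null((j - 1) * h), q_null(j * h), q_null((j + 1) * h)
        means.append((mid - lo) / (hi - lo))
    return means


def approx_mean(d1, d2, y, m):
    """Second-order approximation 1/2 - (1/4) {Q''(y) / Q'(y)} (1/2)^m.

    d1 and d2 are callables returning Q'_null and Q''_null.
    """
    return 0.5 - 0.25 * (d2(y) / d1(y)) * 0.5 ** m
```

For the uniform guess `lambda y: y` every mean is $\tfrac12$. For the illustrative choice $Q_{\mathrm{null}}(y) = y^2$ the level-two means are $1/4$ and $5/12$, and the approximation happens to be exact because the third derivative of $Q_{\mathrm{null}}$ vanishes.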

We also point out another option for achieving a similar aim, via a simple transformation, namely through $Q(y) = Q_{\mathrm{null}}(Q_{\mathrm{unif}}(y))$, where $Q_{\mathrm{unif}}(y)$ is a quantile process centered at the uniform distribution, using symmetric $V_{m,j}$'s. The median of this random $Q(y)$ is equal to $Q_{\mathrm{null}}(y)$. For example, $Q(y) = \mu + \sigma\Phi^{-1}(Q_{\mathrm{unif}}(y))$ defines a quantile process with median value function equal to the quantile function of a normal $(\mu, \sigma^2)$, with $\Phi$ denoting the cumulative distribution function of a standard normal.
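The transformation route is easy to apply to simulated grid values of a uniform-centered pyramid. The following sketch assumes such grid values are already available as a list of numbers in $[0, 1]$; the function name `transform_quantiles` is hypothetical. The end points 0 and 1 are skipped because $\Phi^{-1}$ is infinite there.

```python
from statistics import NormalDist


def transform_quantiles(q_unif_grid, mu, sigma):
    """Map interior values u of Q_unif to mu + sigma * Phi^{-1}(u)."""
    inv = NormalDist().inv_cdf  # standard normal quantile function
    return [mu + sigma * inv(u) for u in q_unif_grid if 0.0 < u < 1.0]
```

Since the map is increasing, the transformed values remain ordered, so the output is again a valid set of quantiles on the real line.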



Proposition 2.3. Assume that the $V_{m,j}$'s are all independent, and write $\mathrm{E}\,V_{m,j} = \tfrac12 + \delta_{m,j}$ and the unconditional variance $\mathrm{Var}\,V_{m,j} = \sigma_{m,j}^2$. Assume further that $|\delta_{m,j}| \le \delta_m$ and $\sigma_{m,j} \le \sigma_m$ for all $j$ at level $m$, where $\sum_{m=1}^\infty \sigma_m^2$ and $\sum_{m=1}^\infty \delta_m$ are both finite. Then again there is a.s. convergence of $q_m(y)$ to $q(y)$, and $Q(y)$ is the integral of $q(y)$.

Proof. As in the previous proof we may represent $q_m(y)$ as $Z_m = W_1 \cdots W_m$, with $W_k = 2V_k^{e_k}$, again writing generically $V^1 = V$ and $V^0 = 1 - V$. Existence of a limit for $Z_m$ is not as automatic as in the previous symmetric case, since its mean differs from 1 and the martingale convergence theorem cannot be directly applied. Consider, however, $Z_m^* = W_1 \cdots W_m/(\xi_1 \cdots \xi_m)$, where $\xi_1 \cdots \xi_m$ is the mean of $Z_m$. Then the martingale theorem applies to $Z_m^*$, which therefore has a well-defined limit $Z^*$. But the sequence of products of means $\xi_1 \cdots \xi_m$ is seen to converge, basically since the conditions imposed imply that $\prod_{r=m+1}^{m'} \xi_r$ must converge to 1 when $m$ and $m' > m$ go to infinity. Next,

$$\mathrm{E}\,Z_m^2 \le \prod_{r=1}^m \{(1 + 2\delta_r)^2 + 4\sigma_r^2\} = \prod_{r=1}^m (1 + 2\delta_r)^2 \prod_{r=1}^m \Bigl\{1 + \frac{4\sigma_r^2}{(1 + 2\delta_r)^2}\Bigr\}.$$

Using the $1 + x \le \exp(x)$ inequality and some further analysis, one sees that the sequence of variances is also bounded, like the sequence of means. That $Q(y)$ is the integral of the limiting quantile density function $q(y)$ under the boundedness of $\mathrm{Var}\,q_m(y)$ condition follows as for the previous proposition, again via techniques from the proof of Kraft's (1964) theorem. □

3. Large pyramidal support. Assume that sufficient conditions are in place for $Q$, and hence also $F = Q^{-1}$, to be absolutely continuous with respect to the Lebesgue measure on $[0, 1]$; cf. the proposition above. Assume the same to be true for $Q_0$, which we shall refer to as the true quantile function. Then $Q_0$ admits the density $f_0$ on $[0, 1]$ with corresponding quantile function $Q_0(y) = F_0^{-1}(y)$ and quantile density $q_0(y) = 1/f_0(Q_0(y))$. Now let $\Pi$ be the probability measure governing the $q = \lim q_m$, and consider the following conditions:

(A) For all $\varepsilon > 0$, $\Pi\{q : \int q \log(q/q_0)\,du < \varepsilon\} > 0$.

(B) For all $\delta > 0$ there exists an $\varepsilon > 0$ such that

$$\int \log\frac{q_0(\tau_\varepsilon(u))}{q_0(u)}\,du < \delta$$

for any $\tau_\varepsilon(u)$ for which $\max_u |\tau_\varepsilon(u) - u| < \varepsilon$ and $\tau_\varepsilon(u) \in [0, 1]$.

(C) The density $f_0$ is bounded by some $K < \infty$.

Proposition 3.1. When conditions (A)-(C) hold, each Kullback-Leibler neighborhood $\{f : \int f_0 \log(f_0/f)\,dx < \varepsilon\}$ around the fixed $f_0$ has positive $\Pi$-probability.

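Proof. We first show that condition (B) implies condition (B0), which is that for all $\delta > 0$, there exists an $\varepsilon > 0$ such that $\int \{q_0(u)/q_0(\tau_\varepsilon(u))\}\,du > 1 - \delta$ for any $\tau_\varepsilon(u)$ for which $\max_u |\tau_\varepsilon(u) - u| < \varepsilon$. For any $\theta > 0$ there exists an $\varepsilon > 0$ such that

$$\int \log\frac{q_0(\tau_\varepsilon(u))}{q_0(u)}\,du < \theta.$$

Now for any positive random variable $Z$, $\log \mathrm{E}\,Z \ge \mathrm{E}\log Z$ and so

$$\log \int \frac{q_0(u)}{q_0(\tau_\varepsilon(u))}\,du \ge \int \log\frac{q_0(u)}{q_0(\tau_\varepsilon(u))}\,du > -\theta$$

and hence

$$\int \frac{q_0(u)}{q_0(\tau_\varepsilon(u))}\,du > \exp(-\theta).$$

Clearly, for any $\delta > 0$ there exists a $\theta > 0$ such that $\exp(-\theta) > 1 - \delta$ and so the claim is proven.
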
By using the transform $x = Q_0(u)$, the Kullback-Leibler distance $\int f_0 \log(f_0/f)\,dx$ may be expressed as

$$\int \log\frac{q(\tau(u))}{q_0(u)}\,du = \int \log\frac{q(\tau(u))}{q_0(\tau(u))}\,du + \int \log\frac{q_0(\tau(u))}{q_0(u)}\,du,$$

where $\tau(u) = F(Q_0(u))$. We will now deal with these two terms separately. For the first term, use the transform $x = \tau(u)$ to give

$$\int q(x) \log\{q(x)/q_0(x)\}\,f_0(Q(x))\,dx.$$

The aim is to show that for any $\delta > 0$, the prior puts positive mass on

$$\int q(x) \log\{q(x)/q_0(x)\}\,f_0(Q(x))\,dx < K \int q(x) \log\{q(x)/q_0(x)\}\,dx + \delta$$

and hence, using condition (A), the prior puts positive mass on

$$\int q(x) \log\{q(x)/q_0(x)\}\,f_0(Q(x))\,dx < \delta$$

for any $\delta > 0$. Now

$$\int q(x) \log\{q(x)/q_0(x)\}\{K - f_0(Q(x))\}\,dx \ge \int f_0(Q(x))\,q_0(x)\,dx - 1$$

and

$$\int f_0(Q(x))\,q_0(x)\,dx = \int \frac{q_0(x)}{q_0(\lambda(x))}\,dx,$$

where $\lambda(x) = F_0(Q(x))$. Condition (A) is sufficient for the prior to put positive mass on $\{Q : \max_u |Q(u) - Q_0(u)| < \theta\}$ for any $\theta > 0$ and so from the absolute continuity of $F_0$, for any $\varepsilon > 0$ the prior puts positive mass on $\{Q : \max_u |\lambda(u) - u| < \varepsilon\}$. Condition (B0) finishes the story for the first term.

For the second term, again, condition (A) is sufficient for the prior to put positive mass on $\{Q : \max_u |Q(u) - Q_0(u)| < \theta\}$ for any $\theta > 0$. Thus, from the absolute continuity of $F$, for any $\varepsilon > 0$ the prior puts positive mass on $\{F : \max_u |\tau(u) - u| < \varepsilon\}$. Hence, using condition (B), for any $\delta > 0$ there exists an $\varepsilon > 0$ such that the prior puts positive mass on

$$\int \log\frac{q_0(\tau_\varepsilon(u))}{q_0(u)}\,du < \delta.$$

This completes the proof. □

Here we establish that condition (A) holds in all situations where the $V_{m,j}$'s have full support on $[0, 1]$, expectations fixed at $\tfrac12$, and with variances decreasing sufficiently fast. It will be clear from the arguments used that condition (A) continues to be in force also when the expectations deviate slightly from $\tfrac12$, within the limits dictated by Proposition 2.3. The sufficiently fast variances in question are recorded in Barron, Schervish and Wasserman (1999) and amount to

$$\sum_{m=1}^\infty \max_j (\mathrm{Var}\,V_{m,j})^{1/2} \le \sum_{m=1}^\infty \sigma_m < \infty.$$

With the conditional expectations of the $V_{m,j}$'s fixed at $\tfrac12$ it is noted that

$$\mathrm{E}\,\mathrm{Var}(V_{m,j} \mid \mathcal{F}_{m-1}) = \mathrm{Var}\,V_{m,j}$$

and so conditioning on previous $V_{m',j'}$ is permitted provided

$$\sum_{m=1}^\infty \max_j \{\mathrm{E}\,\mathrm{Var}(V_{m,j} \mid \mathcal{F}_{m-1})\}^{1/2} < \infty,$$

which is the same sum with the unconditional variances, since $\mathrm{E}(V_{m,j}) = \tfrac12$, in the sense that the argument of Barron, Schervish and Wasserman (1999) then continues to go through. Following Lavine (1994), we can write $D(Q, Q_0) = \int q \log(q/q_0)\,dy$ as the difference of two sums, the first being

(?) EE E ^) lQ g w#? 

m e je{0 ,l} I He) 

where $\lambda$ represents the Lebesgue measure. The $\lambda$ can also be replaced with any smooth $Q_{\mathrm{null}}$, as long as it remains a dominating measure for the $Q$ process; cf. the previous section. Here $B_\varepsilon$ is a dyadic interval and for a particular $m$, we have $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_m)$ where $\varepsilon_i \in \{0, 1\}$ and $(\varepsilon, j) = (\varepsilon_1, \ldots, \varepsilon_m, j)$ for $j \in \{0, 1\}$. So, for example, $B_{0,0} = [0, 1/4)$ and $B_{0,1,1} = [3/8, 4/8)$. Now make the variances of the $V_\varepsilon$ decrease sufficiently rapidly, ensuring that

f, V e , 1-V e I 
> max< log — ; 1 V loe — ■ > 

^ e \ ^\{B £ ,\B £ ) X(B £ j j B e ) J 

converges with positive probability and hence that (7) converges. The second term is $\int q \log q_0\,dy$ which is finite if (7) is, and which can also be expressed as a sum over $m$,

EE E We^flf 

m e je{o,i} J 
The proof is then completed using the fact that the V's have full support 
on [0,1]. 

We note that the result about condition (A) being satisfied is not surpris- 
ing in view of the fact that the Q is similar enough in structure to a Polya 
tree in order for Theorem 2 of Lavine (1994) to apply. 

Condition (B) is a property of $q_0$ and is a quite mild smoothness condition. If $q_0$ is continuous on $(0, 1)$, then we only need consider the integral in a neighborhood of 0 and 1.
4. The Beta and the median-Dirichlet quantile pyramids. We shall discuss two attractive options for the class of distributions $V_{m,j}$ of (3) that make up the core of a quantile pyramid.

4.1. The Beta quantile pyramid. The first option is to use independent Betas for the weights appearing in (3). An appealing choice is to let all $V_{m,j}$'s in (3) be symmetric Beta variables, with parameters, say, $(\tfrac12 a_m, \tfrac12 a_m)$ for those at level $m$. These have variance $\tfrac14/(a_m + 1)$. As long as $\sum_{m=1}^\infty 1/(a_m + 1)$ is finite, the limiting quantile process $Q$ of the $Q_m$'s has a.s. a quantile density function $q(y)$. The slightly stronger condition $\sum_{m=1}^\infty 1/a_m^{1/2} < \infty$ secures condition (A) of Section 3. Note that the $Q$ is constructed as a Polya tree, but importantly its accompanying distribution function $F = Q^{-1}$ is not. Interestingly, the very same condition about the Beta distribution parameters, about $\sum_{m=1}^\infty 1/a_m^{1/2}$ being finite, occurs in Ghosal, Ghosh and Ramamoorthi (1999b), where it is seen to imply posterior consistency of symmetrized Polya trees.

The uncertainty of $q(y)$ around its constant mean 1 is dictated by the variances of the $V_{m,j}$'s, sometimes in complicated ways. Intriguingly, when all $V_{m,j}$'s inside the same generation $m$ have the same distribution, symmetric around $\tfrac12$, we may actually find and assess the distribution of $B_m = \max_y q_m(y)$ and its limit $B = \max q(y)$ explicitly. This is due to the symmetry of the representation $2V_1^{e_1} \cdots 2V_m^{e_m}$ over different intervals, as displayed, for example, in (5). At each node, either $V_{m,j}$ or $1 - V_{m,j}$ is in $(\tfrac12, 1)$, the other in $(0, \tfrac12)$. The maximum value of $q_m(y)$ takes place in that interval for which each of the $m$ components $V_j^{e_j}$ are in $(\tfrac12, 1)$. Hence

$$B_m =_d \prod_{j=1}^m \max\{2V_j, 2(1 - V_j)\} =_d \prod_{j=1}^m (2U_j)$$

in terms of generic $V_1, V_2, \ldots$ from generations $1, 2, \ldots$, where $U_j$ is distributed like $V_j$ conditional on $V_j > \tfrac12$. This distribution converges, under the finite sum of variances condition, and is easily simulated for given regimes for the distributions of $V_j$'s. For the Beta quantile pyramid, $\mathrm{E}\,U_m = \xi(\tfrac12 a_m)$, say, where $\xi(b)$ is the mean of $V \mid \{V > \tfrac12\}$ when $V$ is a symmetric Beta$(b, b)$. Some efforts and integration skills lead to

m ~Ji/2T(b) 2 [ ] r(6FWU + r(6 + i/2)r 

Hence $\mathrm{E}\,B_m = \prod_{j=1}^m \{2\xi(\tfrac12 a_j)\}$.

To assess this usefully, we note that $2(2b + 1)^{1/2}(V - \tfrac12)$ tends to a standard normal, when $V \sim \mathrm{Beta}(b, b)$ and $b$ goes to infinity. This means that $U = \max(V, 1 - V)$ behaves like $\tfrac12 + \tfrac12 Z/(2b + 1)^{1/2}$ for large $b$, where $Z$ is a standard normal conditional on being positive. This leads to $2\xi(b) \approx 1 + (2/\pi)^{1/2}/(2b + 1)^{1/2}$ for increasing $b$. We learn from this that $B = \max q(y)$ has finite mean when $\sum_{m=1}^\infty 1/a_m^{1/2}$ is finite. Also, if, for example, $a_m = c m^3$, then the mean of $B$ may be studied as a function of $c$, which is useful when attempting to elicit a prior process for one's quantile function. One may similarly study the distribution and expected value of $\min q(y) =_d \prod_{j=1}^\infty 2(1 - U_j)$.
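The large-$b$ approximation to $2\xi(b)$ can be compared with a direct simulation, and it also gives a quick way to explore how the mean of $B$ reacts to $c$ when $a_m = c m^3$. The sketch below is only indicative: the function names are hypothetical, the Monte Carlo estimate uses $\xi(b) = \mathrm{E}\max(V, 1 - V)$, which holds by symmetry, and `approx_mean_of_max` applies the normal approximation at every level, including the first few where $b$ is small and the approximation is rough.

```python
import math
import random


def two_xi_monte_carlo(b, draws, rng):
    """Estimate 2*xi(b) = 2 E max(V, 1 - V) for V ~ Beta(b, b) by simulation."""
    total = 0.0
    for _ in range(draws):
        v = rng.betavariate(b, b)
        total += max(v, 1.0 - v)
    return 2.0 * total / draws


def two_xi_normal_approx(b):
    """Large-b approximation 1 + (2/pi)^{1/2} / (2b + 1)^{1/2}."""
    return 1.0 + math.sqrt(2.0 / math.pi) / math.sqrt(2.0 * b + 1.0)


def approx_mean_of_max(c, levels):
    """Rough value of E B_m for a_m = c m^3, using the approximation per level."""
    prod = 1.0
    for m in range(1, levels + 1):
        prod *= two_xi_normal_approx(0.5 * c * m ** 3)
    return prod
```

For the uniform case $b = 1$ the exact value is $2\xi(1) = 1.5$, which the simulation should reproduce closely, while the normal approximation gives about 1.46; the two agree much better once $b$ is moderately large. Larger $c$ pulls `approx_mean_of_max` toward 1, reflecting a prior that is tighter around the uniform quantile density.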

As discussed around (6), we would often wish to center quantile processes at given null distributions, which would need nonsymmetric Beta variables in the above construction, say, employing Beta$(\tfrac12 a_m, \tfrac12 b_m)$ with appropriate $a_m, b_m$. Proposition 2.3 dictates that $a_m$ and $b_m$ need to become close to each other for growing $m$, in order for a limiting quantile density function $q(y)$ to exist.

One special case worth mention is that where all the $V_{m,j}$'s are uniform, corresponding to all $a_m = 2$, where the quantile process amounts to a natural splitting procedure: (i) the median $Q(\tfrac12)$ is uniform on $[0, 1]$; (ii) the two extra quartiles are independent and uniform on $[0, Q(\tfrac12)]$ and $[Q(\tfrac12), 1]$; (iii) the four extra octiles are independent and uniform on the four intervals defined by the three quartiles; and so on. This might be seen as a natural noninformative prior scheme. More generally one might study the case of $a_m = a$ constant, with the same Beta$(\tfrac12 a, \tfrac12 a)$ at work at all levels for the $V_{m,j}$. Then the $Q(y)$ process is a.s. continuous but singular, not equal to the integral of its derivative. This follows from results of Ferguson (1974), page 621; see also Dubins and Freedman (1967), based on the fact that $Q$ behaves as a distribution function with probability 1.

The $V_{m,j}$'s of the Beta quantile pyramid might employ parameters $(\tfrac12 a_m, \tfrac12 b_m)$ that depend on the previous outcomes of $V_{m',j'}$ for $m' < m$, for example, in a Markovian fashion. This gives one the opportunity to modify the behavior of $Q_m$ in light of aspects of $Q_{m-1}$.

4.2. The median-Dirichlet quantile pyramid. Agree to say that a random variable $U$ has a median-Dirichlet distribution with parameter $a$, written $U \sim \mathrm{MD}(a)$, if

$$\Pr\{U \le x\} = H_a(x) = \Pr\{\mathrm{Beta}(ax, a(1 - x)) \ge \tfrac12\} = G(\tfrac12; a(1 - x), ax). \tag{8}$$

Here $G(\cdot\,; a, b)$ denotes the cumulative Beta distribution with parameters $(a, b)$. To motivate this definition, suppose that $F$ is a Dirichlet process with parameter $aF_{\mathrm{unif}}$, where $F_{\mathrm{unif}}$ is the uniform distribution on the unit interval. Then its random median $U = Q(\tfrac12) = \inf\{t : F(t) \ge \tfrac12\}$ does in fact have this $\mathrm{MD}(a)$ distribution. Note that $U$ is symmetric around its center value $\tfrac12$. More generally, when $F_{\mathrm{null}}$ is any probability distribution on the line, say that $U \sim \mathrm{MD}(aF_{\mathrm{null}})$ if $\Pr\{U \le x\} = H_a(F_{\mathrm{null}}(x))$. This is the distribution of a random median from a $\mathrm{Dir}(aF_{\mathrm{null}})$ process; see also Hjort and Petrone (2006). Our emphasis in this connection is on using the $\mathrm{MD}(a)$ distribution as a modeling tool when working with quantile pyramids.

From Proposition 2.2 we know that the degree of continuity of the limiting quantile process is governed by the sizes of the variances of the $V_{m,j}$'s. For the present case this necessitates studying the variances of the $\mathrm{MD}(a)$ distribution (8). This may be written

$$\tau^2(a) = \int_0^1 \Pr{}_a\{U^2 \ge x\}\,dx - (\tfrac12)^2 = \int_0^1 G(\tfrac12; a x^{1/2}, a(1 - x^{1/2}))\,dx - 1/4.$$

Inspection and some analysis reveal that $\tau^2(a)$ starts with value $1/12$ for $a$ at zero and then goes down at rate $O(1/a)$ when $a$ grows. Intriguingly, $\tau^2(a) = (1/4)\rho(a)/(a + 1)$ where $\rho(a)$ goes monotonically from $1/3$ up to 1 as $a$ grows, making the $\mathrm{MD}(a)$ distribution quite similar to the Beta$(\tfrac12 a, \tfrac12 a)$ for growing $a$. Hence remarks made earlier for the Beta quantile pyramids have clear analogues for the median-Dirichlet governed quantile pyramids; convergence of $\sum_{m=1}^\infty 1/(a_m + 1)$ secures absolute continuity, for example. It may also be attractive to determine the concentration parameter $a$ at level $m$ by taking into account the results realized at level $m - 1$. One such option is $V_{m,j} \mid \mathcal{F}_{m-1} \sim \mathrm{MD}(a_{m,j})$ with $a_{m,j} = b_m/\Delta(m - 1, j)$, in terms of $\Delta(m - 1, j) = Q_{m-1}((j + 1)/2^m) - Q_{m-1}((j - 1)/2^m)$. Finiteness of $\sum_{m=1}^\infty 1/(1 + b_m)$ secures absolute continuity of the resulting quantile pyramid.

5. Exact posterior and pseudo-posterior pyramids. Let $X_1, \ldots, X_n$ be independent observations from a continuous distribution $F$ on $[0, 1]$. We shall discuss ways of obtaining the posterior distribution of the quantile process.

One point of view is that $Q$ defines the cumulative distribution function $F$, after which aspects of the posterior distribution of $F$ may in principle be derived via the defining characteristics

$$\Pr\{F \in C, X_1 \in A_1, \ldots, X_n \in A_n\} = \mathrm{E}\,I\{F \in C\}\,F(A_1) \cdots F(A_n),$$

valid for all Borel subsets $C$ of the space of cumulative distribution functions and for all intervals $A_1, \ldots, A_n$. Then aspects of $Q$ given data may be derived using (1). For example, considering a single $Q(y)$,

$$\Pr\{Q(y) \le x, X_i \in x_i \pm \varepsilon \text{ for each } i\} = \mathrm{E}\,I\{Q(y) \le x\} \prod_{i=1}^n \{Q^{-1}(x_i + \varepsilon) - Q^{-1}(x_i - \varepsilon)\},$$

which in principle should lead to the posterior distribution of $Q(y)$. This would often be a cumbersome route to follow, however, which is why we circumvent the $F$ here, attempting instead to work directly with the $Q$ process.

First a comment on how we do this is in order. We obtain the exact posterior for the prior $\Pi_m$. Such a prior generates random quantile functions by linear interpolation. We work with $\Pi_m$ in the same way that posterior Polya trees are obtained for partitions constructed down to a finite level. What is important is that the prior is defined and exists for all $m$ and as $m \to \infty$ converges to a well-defined prior $\Pi$.

5.1. Exact posterior inference for $\Pi_m$. The prior, as has been mentioned, generates quantile functions based on linear interpolation between random points. Then the inverse of $Q_m$, say $F_m$, is linear on each quantile interval $[q_{j-1}, q_j]$, with a constant derivative there;

$$f_m(x) = F_m'(x) = \frac{1}{k}\,\frac{1}{q_j - q_{j-1}} \quad \text{for } x \in (q_{j-1}, q_j), \tag{9}$$

for $j = 1, \ldots, k = 2^m$. Here $q_0 = 0$ and $q_k = 1$. This amounts to a "random histogram" type model, with random cell widths but fixed probabilities over these cells.

There is also another route to the (9) density, as follows. In general, for a smooth distribution $F$ with density $f$ and quantile function $Q$, the quantile density function is $q(y) = Q'(y) = 1/f(Q(y))$. Inverting this gives $f(x) = 1/q(F(x))$. In the present context this leads naturally to the level $m$ prior which generates random densities of the type

$$f_m(x) = 1/q_m(F_m(x)),$$

where $F_m(x)$ for given $x$ is the solution $y$ to the equation $Q_m(y) = x$. But this can be seen to be exactly the same as (9), due to expression (4) for $q_m$ and the linear interpolation character of $Q_m$ and $F_m$.
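The equivalence of the two routes can be checked numerically. The sketch below assumes the quantiles are stored as a list `q = [0, q_1, ..., q_{k-1}, 1]`; the function names are illustrative and not taken from the paper.

```python
from bisect import bisect_left


def cell_index(x, q):
    """1-based index j with q[j-1] < x <= q[j], for q = [0, q_1, ..., q_{k-1}, 1]."""
    return min(max(bisect_left(q, x), 1), len(q) - 1)


def f_m(x, q):
    """Density (9): 1 / (k (q_j - q_{j-1})) on the j-th quantile interval."""
    k = len(q) - 1
    j = cell_index(x, q)
    return 1.0 / (k * (q[j] - q[j - 1]))


def F_m(x, q):
    """Piecewise-linear distribution function, the inverse of Q_m."""
    k = len(q) - 1
    j = cell_index(x, q)
    return (j - 1 + (x - q[j - 1]) / (q[j] - q[j - 1])) / k


def Q_m(y, q):
    """Quantile function: linear interpolation with Q_m(j/k) = q_j."""
    k = len(q) - 1
    j = min(int(y * k), k - 1)               # cell with j/k <= y < (j+1)/k
    return q[j] + (y * k - j) * (q[j + 1] - q[j])


def quantile_density(y, q):
    """q_m(y) = Q_m'(y) = k (q_{j+1} - q_j) inside the cell containing y."""
    k = len(q) - 1
    j = min(int(y * k), k - 1)
    return k * (q[j + 1] - q[j])
```

With `q = [0.0, 0.1, 0.3, 0.6, 1.0]`, for instance, `f_m(0.2, q)` and `1 / quantile_density(F_m(0.2, q), q)` both evaluate the density on the second cell, which has width 0.2 and probability 1/4.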

Under this linear interpolation prior there is a well-defined likelihood

$$L_n(q) = \prod_{j=1}^k \Bigl(\frac{1}{k}\,\frac{1}{q_j - q_{j-1}}\Bigr)^{N_j(q)}, \tag{10}$$

where $N_j(q) = nF_n(q_{j-1}, q_j]$ is the number of points falling inside the $j$th quantile interval (and with $F_n$ being the empirical distribution of the data). Its behavior for growing $n$ is dictated by

$$-n^{-1}\log L_n(q) = \sum_{j=1}^k n^{-1}N_j(q)\log(q_j - q_{j-1}) + \log k \to_p \lambda(q) = \sum_{j=1}^k F_0(q_{j-1}, q_j]\,\log\frac{q_j - q_{j-1}}{1/k},$$

with $F_0$ denoting the distribution that generates the data. For fixed prior and growing $n$, the posterior distribution of $q = (q_1, \ldots, q_{k-1})$ will concentrate on decreasing neighborhoods around $q^0 = (q^0_1, \ldots, q^0_{k-1})$, defined as the minimizer of $\lambda(q)$.
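A short sketch of the likelihood (10) and of the limit function $\lambda(q)$ follows. It assumes data in $(0,1]$ and quantiles stored as `q = [0, q_1, ..., q_{k-1}, 1]`; the names `cell_counts`, `log_lik_interp` and `lam` are introduced here for illustration only.

```python
import math
from bisect import bisect_left


def cell_counts(data, q):
    """N_j(q): number of data points in (q_{j-1}, q_j], j = 1, ..., k."""
    counts = [0] * (len(q) - 1)
    for x in data:
        j = min(max(bisect_left(q, x), 1), len(q) - 1)
        counts[j - 1] += 1
    return counts


def log_lik_interp(data, q):
    """log L_n(q) for the linear interpolation likelihood (10)."""
    k = len(q) - 1
    return -sum(n_j * math.log(k * (q[j + 1] - q[j]))
                for j, n_j in enumerate(cell_counts(data, q)))


def lam(q, F0):
    """lambda(q) = sum_j F0(q_{j-1}, q_j] log{(q_j - q_{j-1}) / (1/k)}.

    F0 is a callable distribution function supplied by the user.
    """
    k = len(q) - 1
    return sum((F0(q[j + 1]) - F0(q[j])) * math.log(k * (q[j + 1] - q[j]))
               for j in range(k))
```

Passing the empirical distribution function as `F0` reproduces $-n^{-1}\log L_n(q)$ exactly, which is the identity behind the limit statement.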

5.2. The multinomial substitute likelihood. In the setting above, assume that a pyramid-type probability distribution is given for the $k - 1$ quantiles $q_1, \ldots, q_{k-1}$, where $k = 2^m$, but we avoid any further specification of $Q$. We define the pseudo, or substitute, likelihood for the data as the multinomial probability

$$\tilde L_n(q) = \binom{n}{N_1(q), \ldots, N_k(q)} \Bigl(\frac1k\Bigr)^{N_1(q)} \cdots \Bigl(\frac1k\Bigr)^{N_k(q)} = \frac{n!}{N_1(q)! \cdots N_k(q)!}\Bigl(\frac1k\Bigr)^n. \tag{11}$$

Such a construction can be found in Jeffreys (1967), Chapter 4, for the particular case of the median, that is, for $k = 2$; Jeffreys noted that it would yield a "valid uncertainty." This has been further discussed by Kalbfleisch (1978) and by Monahan and Boos (1992), who pointed out that $\tilde L_n(q)$ is not the conditional distribution of the data given any statistic, and by Lavine (1995), who showed that in any case using this substitute likelihood produces asymptotically conservative inference. The following arguments and results provide more general insight into aspects discussed in the above references, and specifically lend support to Jeffreys's claim; further discussion is offered in Section 8.

We start with Stirling's formula and find

$$\log\Gamma(np + 1) = (np + \tfrac12)(\log n + \log p) - np + \log(2\pi)^{1/2} + (1/12)(np)^{-1} + O((np)^{-2})$$

for growing $np$, from which we derive

$$-n^{-1}\log\tilde L_n(q) = \sum_{j=1}^k \Bigl\{p_j\log p_j + \frac{1}{2n}\log p_j\Bigr\} + \log k + \frac{k-1}{2n}\log(2\pi n) + R_n(q),$$

where $p_j = N_j(q)/n = F_n(q_{j-1}, q_j]$ is the relative proportion of points falling inside the $j$th quantile interval and

$$R_n(q) = \frac{1}{12n^2}\Bigl\{\sum_{j=1}^k \frac{1}{p_j} - 1\Bigr\} + \text{smaller terms}$$

goes to zero in probability. If the data points are generated from some $F_0$, so that $F_n \to F_0$ uniformly, with probability 1, then

$$-n^{-1}\log\tilde L_n(q) \to_p \tilde\lambda(q) = \sum_{j=1}^k F_0(q_{j-1}, q_j]\,\log\frac{F_0(q_{j-1}, q_j]}{1/k}.$$

This is the Kullback-Leibler distance from the discrete probability distribution with point masses $F_0(q_{j-1}, q_j]$ for $j = 1, \ldots, k$ to the uniform one with point masses $1/k$. This lends credibility to (11) as being appropriate for the nonparametric framework, since maximizing $\tilde L_n(q)$ for large $n$ amounts to minimizing $\tilde\lambda(q)$, which happens exactly for $F_0(q_{j-1}, q_j] = 1/k$ for each $j$, that is, $F_0(q_j) = j/k$. Also, since $\sum_{j=1}^k u_j\log(u_j/k^{-1}) \doteq \tfrac12 k\sum_{j=1}^k (u_j - 1/k)^2$ when all the $u_j$'s are close to $1/k$, $\tilde L_n(q)$ is approximately proportional to

$$L_n^*(q) = \exp\Bigl[-\tfrac12 nk\sum_{j=1}^k \{F_n(q_{j-1}, q_j] - 1/k\}^2\Bigr] \Big/ (p_1 \cdots p_k)^{1/2}. \tag{12}$$
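The Stirling expansion and the approximation (12) can be checked numerically from the cell counts alone. The following sketch uses `math.lgamma` for the log factorials; the function names are illustrative assumptions, not notation from the paper.

```python
import math


def log_lik_multinomial(counts):
    """log of (11): log n! - sum_j log N_j! - n log k."""
    n, k = sum(counts), len(counts)
    return (math.lgamma(n + 1)
            - sum(math.lgamma(c + 1) for c in counts)
            - n * math.log(k))


def kl_to_uniform(p):
    """sum_j p_j log(p_j / (1/k)), with the convention 0 log 0 = 0."""
    k = len(p)
    return sum(pj * math.log(k * pj) for pj in p if pj > 0)


def log_lik_star(counts):
    """log of the approximation (12); requires all counts to be positive."""
    n, k = sum(counts), len(counts)
    p = [c / n for c in counts]
    return (-0.5 * n * k * sum((pj - 1.0 / k) ** 2 for pj in p)
            - 0.5 * sum(math.log(pj) for pj in p))
```

For counts close to $n/k$ in every cell, `log_lik_multinomial(counts) - log_lik_star(counts)` should be nearly the same constant whatever the counts, which is what "approximately proportional" amounts to on the log scale.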



6. Updating. Consider a pyramid quantile process of the general type described in Section 2, interpreted as a prior process for an unknown quantile function $Q_m(y)$ in a nonparametric Bayesian setup. This section describes how we may update $Q_m$ after having observed a sample $x_1, \ldots, x_n$. Following Section 5 there is the exact likelihood and the pseudo-likelihood; two related but different ways of handling the updating. We show that for both versions, the pyramid structure is retained, leading to certain simplifications for the posterior and pseudo-posterior quantile distributions.

6.1. Updating the linear interpolation prior. Consider a quantile process described down to level $m$, involving a $Q_m$ defined in terms of the $q_j = Q(j/k)$ quantiles for $j = 1, \ldots, k - 1$, where $k = 2^m$. There is a prior $\Pi_m(q)$ for $q = (q_1, \ldots, q_{k-1})$ of the type (2). With the likelihood (10), the exact posterior is given by

$$\Pi_m(q \mid {\rm data}) \propto \Pi_m(q) L_n(q). \tag{13}$$

We now demonstrate that the likelihood factorizes in pyramidal fashion.

The basic step involves the following quantity. Let $M_n(a,b) = nF_n(a,b]$ count the number of data points having fallen inside $(a,b]$, and study

$$R_n(q; a, b) = \Bigl(\frac{1}{2(q-a)/(b-a)}\Bigr)^{M_n(a,q)} \Bigl(\frac{1}{2(b-q)/(b-a)}\Bigr)^{M_n(q,b)} \quad \text{for } q \in (a,b).$$

To exemplify, we find for $m = 2$ that

$$L_n(q_1, q_2, q_3) = R_n(q_2; 0, 1)\, R_n(q_1; 0, q_2)\, R_n(q_3; q_2, 1).$$

Similarly, for $m = 3$, with $q_0 = 0$ and $q_8 = 1$,

$$L_n(q) = R_n(q_4; 0, 1)\, R_n(q_2; 0, q_4)\, R_n(q_6; q_4, 1) \times \prod_{j=1,3,5,7} R_n(q_j; q_{j-1}, q_{j+1}).$$

The general formula involving $R_n(Q(j/2^m); {\rm parents})$ follows and becomes

$$L_n(q) = R_n(Q(\tfrac12); 0, 1)\, R_n(Q(\tfrac14); 0, Q(\tfrac12))\, R_n(Q(\tfrac34); Q(\tfrac12), 1) \times \prod_{j=1,3,5,7} R_n(Q(j/8); {\rm parents}) \times \cdots \times \prod_{j=1,3,5,\ldots,2^m-1} R_n(Q(j/2^m); {\rm parents}).$$

Verifying that this is identical to $L_n(q)$ of (10), with $q_j = Q(j/2^m)$, is a matter of algebra and book-keeping. This leads to an expression for the posterior distribution:

$$\Pi_m(q \mid {\rm data}) \propto \pi_{1,1}(Q(\tfrac12))\, R_n(Q(\tfrac12); 0, 1) \times \prod_{j=1,3} \pi_{2,j}(Q(j/4) \mid {\rm parents})\, R_n(Q(j/4); {\rm parents}) \times \prod_{j=1,3,5,7} \pi_{3,j}(Q(j/8) \mid {\rm parents})\, R_n(Q(j/8); {\rm parents}) \cdots. \tag{14}$$
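The "algebra and book-keeping" can be confirmed numerically for any level $m$. The sketch below sums $\log R_n(q_j; {\rm parents})$ level by level and compares the result with $\log L_n(q)$ of (10); it assumes `q = [0, q_1, ..., q_{k-1}, 1]` with $k = 2^m$, and the function names are illustrative.

```python
import math


def count_in(data, a, b):
    """M_n(a, b): number of data points in (a, b]."""
    return sum(1 for x in data if a < x <= b)


def log_R(data, q, a, b):
    """log R_n(q; a, b) for a < q < b."""
    left = 2.0 * (q - a) / (b - a)
    right = 2.0 * (b - q) / (b - a)
    return -(count_in(data, a, q) * math.log(left)
             + count_in(data, q, b) * math.log(right))


def log_lik_pyramid(data, q):
    """Sum of log R_n(q_j; parents) over all levels of the pyramid."""
    k = len(q) - 1
    total, step = 0.0, k // 2
    while step >= 1:
        for j in range(step, k, 2 * step):       # odd multiples of step
            total += log_R(data, q[j], q[j - step], q[j + step])
        step //= 2
    return total


def log_lik_direct(data, q):
    """log L_n(q) of (10), computed cell by cell."""
    k = len(q) - 1
    return -sum(count_in(data, q[j], q[j + 1]) * math.log(k * (q[j + 1] - q[j]))
                for j in range(k))
```

The two functions agree because, for each cell, the ratios of child width to parent width telescope down the levels to $k(q_j - q_{j-1})$.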

This part provides details of the Metropolis-Hastings algorithm for the linear interpolation process. The posterior density for $q = (q_1, \ldots, q_{k-1})$ is given by

$$\Pi_m(q \mid {\rm data}) \propto \Pi_m(q) \prod_{j=1}^k \Bigl(\frac{1}{q_j - q_{j-1}}\Bigr)^{N_j(q)}, \quad \text{where } k = 2^m.$$

A Metropolis-Hastings algorithm proceeds by taking a proposal $q'$ for $q$, which we do by changing one component at a time; that is, we take $q'_j$ uniform on $(q_{j-1}, q_{j+1})$ and $q'_l = q_l$ for $l \ne j$. Consequently, the accept-reject ratio for the algorithm is

$$\min\Bigl\{1,\ \frac{(q_j - q_{j-1})^{N_j(q)}\,(q_{j+1} - q_j)^{N_{j+1}(q)}\,\Pi_m(q')}{(q'_j - q'_{j-1})^{N_j(q')}\,(q'_{j+1} - q'_j)^{N_{j+1}(q')}\,\Pi_m(q)}\Bigr\}.$$

This is in principle a straightforward algorithm to implement.
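As an illustration of the scheme just described, here is a minimal sketch of one sweep of the component-wise sampler. The log prior is passed in as a callable `log_prior`; the function names, and the choice to recompute the full log posterior at each step rather than only the two affected cells, are simplifications made for this sketch.

```python
import math
import random
from bisect import bisect_left


def cell_counts(data, q):
    """N_j(q) for q = [0, q_1, ..., q_{k-1}, 1]."""
    counts = [0] * (len(q) - 1)
    for x in data:
        counts[min(max(bisect_left(q, x), 1), len(q) - 1) - 1] += 1
    return counts


def log_post(data, q, log_prior):
    """log of Pi_m(q) prod_j (q_j - q_{j-1})^{-N_j(q)}, up to a constant."""
    counts = cell_counts(data, q)
    return log_prior(q) - sum(c * math.log(q[j + 1] - q[j])
                              for j, c in enumerate(counts))


def mh_sweep(data, q, log_prior, rng):
    """Update q_1, ..., q_{k-1} in turn; returns the new quantile list."""
    q = list(q)
    current = log_post(data, q, log_prior)
    for j in range(1, len(q) - 1):
        proposal = list(q)
        proposal[j] = rng.uniform(q[j - 1], q[j + 1])    # symmetric proposal
        if not q[j - 1] < proposal[j] < q[j + 1]:
            continue                                     # guard against endpoints
        candidate = log_post(data, proposal, log_prior)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            q, current = proposal, candidate
    return q
```

A run might start from equally spaced quantiles and call `mh_sweep` repeatedly, averaging the sampled `q` lists to estimate $Q(j/k)$. A prior that is flat on the ordered quantiles, `lambda q: 0.0`, is a convenient placeholder for testing but is not one of the pyramid priors of the type (2).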

There is of course a broadly flexible class of priors to use when it comes to the choice of $\Pi_m(q)$, via (2). For illustration, take all the $V_{m,j}$'s of representation (3) to be independent, with the $V_{m',j}$'s at level $m'$ coming from the same density $g_{m'}$, as with the Beta quantile pyramids. Then, at level $m = 5$, $\Pi_5(q_1, \ldots, q_{31})$ may be written

$$g_1(q_{16}) \prod_{j \in S_2} g_2\Bigl(\frac{q_j - q_{j-8}}{q_{j+8} - q_{j-8}}\Bigr)\frac{1}{q_{j+8} - q_{j-8}} \times \prod_{j \in S_3} g_3\Bigl(\frac{q_j - q_{j-4}}{q_{j+4} - q_{j-4}}\Bigr)\frac{1}{q_{j+4} - q_{j-4}}$$
$$\times \prod_{j \in S_4} g_4\Bigl(\frac{q_j - q_{j-2}}{q_{j+2} - q_{j-2}}\Bigr)\frac{1}{q_{j+2} - q_{j-2}} \times \prod_{j \in S_5} g_5\Bigl(\frac{q_j - q_{j-1}}{q_{j+1} - q_{j-1}}\Bigr)\frac{1}{q_{j+1} - q_{j-1}} \tag{15}$$

on the set where $0 < q_1 < \cdots < q_{31} < 1$, in which $S_2 = \{8, 24\}$, $S_3 = \{4, 12, 20, 28\}$, $S_4 = \{2, 6, 10, 14, 18, 22, 26, 30\}$, and $S_5 = \{1, 3, 5, \ldots, 31\}$.
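The structure of (15) extends to any level $m$ by looping over the pyramid. The sketch below evaluates the log prior density for `q = [0, q_1, ..., q_{k-1}, 1]`, given a user-supplied function `log_g(level, v)` for the log density of the level-specific interpolators; both names are illustrative assumptions.

```python
import math


def log_pyramid_prior(q, log_g):
    """log Pi_m(q) for independent V_{level, j} with densities exp(log_g(level, .)).

    Returns -inf outside the set 0 < q_1 < ... < q_{k-1} < 1.
    """
    k = len(q) - 1
    total, step, level = 0.0, k // 2, 1
    while step >= 1:
        for j in range(step, k, 2 * step):
            width = q[j + step] - q[j - step]        # length of the parent interval
            if width <= 0.0:
                return -math.inf
            v = (q[j] - q[j - step]) / width         # implied V_{level, j}
            if not 0.0 < v < 1.0:
                return -math.inf
            total += log_g(level, v) - math.log(width)   # Jacobian factor 1 / width
        step //= 2
        level += 1
    return total
```

With uniform interpolators, `log_g = lambda level, v: 0.0`, only the Jacobian factors remain, so the density depends on the parent interval lengths and varies with $q$.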

6.2. Updating with the multinomial substitute likelihood. For this approach we use $\tilde L_n(q)$ instead of $L_n(q)$ in (13), and consider

$$\tilde R_n(q; a, b) = \binom{M_n(a,b)}{M_n(a,q)}\Bigl(\frac12\Bigr)^{M_n(a,b)} \quad \text{for } q \in (a,b). \tag{16}$$

This is also the symmetric binomial probability that $M_n(a,q)$ of the points, among the $M_n(a,b)$, will fall in the $(a,q]$ interval. Note that $M_n(a,q)$ and $M_n(q,b)$ depend in a somewhat cumbersome form on the $q$ argument. We shall see, via algebraic manipulations of the multinomial likelihood (11), that it also factorizes into various contributions, of the type (16). To exemplify, for the case $m = 2$ one finds that $\tilde L_n(q_1, q_2, q_3)$ equals

$$\tilde R_n(q_2; 0, 1)\, \tilde R_n(q_1; 0, q_2)\, \tilde R_n(q_3; q_2, 1).$$

Similarly, for $m = 3$, we find

$$\tilde L_n(q) = \tilde R_n(q_4; 0, 1)\, \tilde R_n(q_2; 0, q_4)\, \tilde R_n(q_6; q_4, 1) \prod_{j=1,3,5,7} \tilde R_n(q_j; q_{j-1}, q_{j+1})$$

in terms of the octiles vector $(q_1, \ldots, q_7)$, and so on. The general formula follows as for the previous case and verification is again a matter of algebra and book-keeping. This leads to an expression for the pseudo-posterior distribution of the same structure as (14).

The best way of sampling from the pseudo-posterior distribution of the vector $(q_1, \ldots, q_{k-1})$ appears to be via a Metropolis-Hastings type algorithm, as follows. A proposal for $q$ is taken to be $q'$ given by $q'_j$ uniform on $(q_{j-1}, q_{j+1})$ with $q'_l = q_l$ for $l \ne j$. For an iteration, we sweep through all the $j$'s in turn. Consequently, the accept-reject ratio for the algorithm is given by

$$\min\Bigl\{1,\ \frac{N_j(q)!\,N_{j+1}(q)!\,\Pi_m(q')}{N_j(q')!\,N_{j+1}(q')!\,\Pi_m(q)}\Bigr\}.$$

For general discussion of aspects of the Metropolis-Hastings type algorithms, see, for example, Tierney (1994).
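Both the factorization into terms of the type (16) and the accept-reject ratio can be sketched with log factorials computed through `math.lgamma`. The function names, the list layout `q = [0, q_1, ..., q_{k-1}, 1]` and the callable `log_prior` are assumptions made for this illustration; all data are taken to lie in $(0,1]$.

```python
import math


def count_in(data, a, b):
    """M_n(a, b): number of data points in (a, b]."""
    return sum(1 for x in data if a < x <= b)


def log_R_tilde(data, q, a, b):
    """log of (16): symmetric binomial probability for splitting (a, b] at q."""
    m_ab, m_aq = count_in(data, a, b), count_in(data, a, q)
    return (math.lgamma(m_ab + 1) - math.lgamma(m_aq + 1)
            - math.lgamma(m_ab - m_aq + 1) - m_ab * math.log(2.0))


def log_lik_pyramid_tilde(data, q):
    """Sum of the log contributions of type (16) over all pyramid levels."""
    k = len(q) - 1
    total, step = 0.0, k // 2
    while step >= 1:
        for j in range(step, k, 2 * step):
            total += log_R_tilde(data, q[j], q[j - step], q[j + step])
        step //= 2
    return total


def log_lik_multinomial(data, q):
    """log of the multinomial substitute likelihood (11)."""
    k, n = len(q) - 1, len(data)
    counts = [count_in(data, q[j], q[j + 1]) for j in range(k)]
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
            - n * math.log(k))


def log_accept_ratio(data, q, j, new_qj, log_prior):
    """log of N_j(q)! N_{j+1}(q)! Pi_m(q') / {N_j(q')! N_{j+1}(q')! Pi_m(q)}."""
    proposal = list(q)
    proposal[j] = new_qj

    def log_fact_pair(v):
        return (math.lgamma(count_in(data, v[j - 1], v[j]) + 1)
                + math.lgamma(count_in(data, v[j], v[j + 1]) + 1))

    return (log_fact_pair(q) - log_fact_pair(proposal)
            + log_prior(proposal) - log_prior(q))
```

Only the counts in the two cells adjacent to $q_j$ change under the proposal, which is why the ratio involves just $N_j$ and $N_{j+1}$.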
6.3. Illustrations. A number of simulations were undertaken. Firstly, for a moderate sample size of $n = 100$ and with $k = 32$ quantile cells (that is, level $m = 5$), the true quantile function, from which the data were simulated, was taken to be $Q(y) = y^2$. The Gibbs sampler was run for 5000 iterations and all the samples were used in constructing the Bayes estimate of $Q(y)$. Figure 2 is the Bayes estimate of $Q$ using the substitute likelihood. The bold line denotes the estimate and the dotted line the true quantile function. Figure 3 corresponds to the Bayes estimate based on the linear interpolation process. Again, the bold line is the estimate and the dotted line is the true quantile function. The prior used in both cases is the uniform for the quantile interpolators $V_{m,j}$'s, that is, as in (15) with the uniform for $g_1, g_2, g_3, g_4, g_5$. Note that the accompanying joint density for $(q_1, \ldots, q_{31})$, for this "uniform stick-breaking prior," is not flat in $q$-space.

7. Bayesian consistency. In this section we provide results related to Bayesian consistency and asymptotic proximity of the approaches/models used in Sections 5 and 6. We go further in Section 8, reaching large-sample approximation results of the Bernshtein-von Mises theorem variety.

Subject to regularity conditions on $f_0$ [see conditions (B) and (C) in Section 3], the prior can be arranged so that it puts positive mass on all Kullback-Leibler neighborhoods of $f_0$; see Proposition 3.1. With this it is well known that the posterior distributions accumulate in weak neighborhoods of $f_0$. That is, $\Pi_n(A) = \Pi(A \mid X^n) \to 1$ with $F_0$-probability 1 where $X^n = (X_1, \ldots, X_n)$ and $A$ is any weak neighborhood of $f_0$.




Fig. 2. With $n = 100$ data points simulated from the distribution with quantile function $Q_0(y) = y^2$, the figure displays the Bayes estimate of $Q$ using the substitute likelihood. The bold line denotes the estimate and the dotted line the true quantile function. [Horizontal axis: $y$, from 0.0 to 1.0.]

Fig. 3. With $n = 100$ data points simulated from the distribution with quantile function $Q_0(y) = y^2$, the figure displays the Bayes estimate of $Q$ using the linear interpolation likelihood. The bold line denotes the estimate and the dotted line the true quantile function. [Horizontal axis: $y$, from 0.0 to 1.0.]

Nevertheless, inference will be based on $\Pi_m$, which will typically be sample size dependent; the more samples, the larger $m$ will be taken. We therefore study consistency assuming that $m_n$ increases as $n$ increases. This is what an experimenter would do and so we establish consistency for such a procedure. So, consider now the prior $\Pi^*_n = \Pi_{m_n}$ which generates $f_m(x)$ defined in (9), with resolution level $m = m_n$, now allowed to increase slowly with sample size, with consequent $k = k_n = 2^{m_n}$ cells.

Proposition 7.1. Let independent observations $X_1, X_2, \ldots$ be generated from a density $f_0$ that is inside the Kullback-Leibler support of $\Pi$, the limit of quantile pyramids $\Pi_m$, assuming conditions (A)-(C) of Section 3 hold. Let furthermore $\Pi^*_n = \Pi_{m_n}$ be the quantile prior stemming from construction (9) for $f_m$, with

$$k_n \to \infty \quad \text{and} \quad k_n/n \to 0. \tag{17}$$

Then the sequence of posterior distributions is Hellinger consistent at $f_0$.

Proof. For a finite $m$ the number of Hellinger balls required to fill up the space of densities generated by $\Pi^*_n$ is finite. Call this number $N_n$. In order to achieve consistency with this sample size dependent prior, along with the support condition, as in Section 3, we require that

$$\sum_{n=1}^\infty \exp(-nc) N_n < \infty \quad \text{for each positive } c.$$

This result can be found in Walker (2003) based on previous findings from Ghosal, Ghosh and Ramamoorthi (1999a). The $c$ is related to the size of the balls and hence the need for it to be arbitrarily small. Let us fix the size of the balls to be $\delta > 0$. An observation is that if $|q_j - q^*_j| < \varepsilon$ for all $j = 1, \ldots, k_n - 1$, then the Hellinger distance between the corresponding densities $f$ and $f^*$ will be bounded by $\delta$ for small enough $\varepsilon$. Now split the unit interval into $K = [1/\varepsilon]$ equal parts all of size $\varepsilon$. Then clearly

$$N_n(\varepsilon) \le K^{k_n} \quad \text{with } k_n = 2^{m_n}.$$

So we require

$$\sum_{n=1}^\infty \exp\{-n(c - n^{-1}k_n\log K)\} < \infty$$

for all $c > 0$ and $K$, which happens precisely under the (17) condition. □
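The final summability claim can be spelled out in one line. The display below is a sketch of that step; the index $n_0$ is notation introduced here and does not appear in the original argument.

```latex
% K is fixed once \varepsilon is fixed, and k_n / n \to 0 by (17), so there is
% an n_0 (notation introduced here) with n^{-1} k_n \log K \le c/2 for n \ge n_0.
\[
  \sum_{n \ge n_0} \exp\{-n(c - n^{-1} k_n \log K)\}
  \le \sum_{n \ge n_0} \exp(-nc/2)
  = \frac{\exp(-n_0 c/2)}{1 - \exp(-c/2)} < \infty .
\]
```

The finitely many terms with $n < n_0$ do not affect convergence, so the full series is finite for every $c > 0$.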

A refinement is possible here by allowing the size of the balls to also 
depend on n. 

8. Approximations to the pseudo-posterior distribution. Here we examine various natural approximations to the pseudo-posterior quantile pyramids, and reach so-called Bernshtein-von Mises theorems under natural conditions.

For parametric models a classic large-sample result about the maximum likelihood estimator $\hat\theta$ for the parameter $\theta$ is that $\sqrt n(\hat\theta - \theta_0)$ tends to ${\rm N}(0, J(\theta_0)^{-1})$, with $J(\theta_0)$ the information matrix at the true parameter value $\theta_0$. A Bayesian mirror result to this is that under mild conditions on the model and the prior used for $\theta$, the posterior distribution of $\sqrt n(\theta - \hat\theta)$ will a.s. have the same limit distribution. This also implies that the Bayes estimator ${\rm E}(\theta \mid {\rm data})$ and the maximum likelihood estimator become $\sqrt n$-equivalent for large $n$. Such results can be traced back to Bernshtein (1917) and von Mises (1931) and are often collectively referred to as Bernshtein-von Mises theorems. The importance of these results is partly that an easy-to-use approximation can be used in applied Bayesian statistics, in cases where the precise prior-to-posterior calculations are complicated, but lies also in revealing that data appropriately wash out the prior as the data information level increases; different priors will lead to approximately the same inference, and this inference will also agree to the first order of magnitude with that of classic frequentist approaches. In Bayesian nonparametrics such results are not to be taken for granted [see, e.g., Freedman (1999) and Hjort (2003) for counterexamples], and are also typically harder to prove if they hold at all; see, for example, Ghosal (2000), about exponential families with a growing number of parameters, and Kim and Lee (2004) and De Blasi and Hjort (2007), concerned with semiparametric event history models with Beta process type priors.

Let again $f_0$ be the true density underlying independent data $X_1, \ldots, X_n$ on $[0,1]$, with cumulative and quantile distribution functions $F_0$ and $Q_0$. Two results from classic empirical process theory are that

$$A_n(t) = \sqrt n\{F_n(t) - F_0(t)\} \to_d W^0(F_0(t)),$$
$$B_n(y) = \sqrt n\{F_n^{-1}(y) - Q_0(y)\} \to_d q_0(y)W^0(y) = W^0(y)/f_0(Q_0(y)),$$

where $W^0$ is a Brownian bridge, that is, a zero-mean normal process with covariance function $y_1(1 - y_2)$ for $y_1 \le y_2$. The first convergence takes place in the space $D_R[0,1]$ of right-continuous functions with left-hand limits on $[0,1]$ while the second holds in each space $D_L[\varepsilon, 1 - \varepsilon]$ of left-continuous functions with right-hand limits on $[\varepsilon, 1 - \varepsilon]$, both equipped with suitable versions of the Skorokhod topology. For these results see, for example, Shorack and Wellner (1986), Chapter 3.

We shall first focus on quantiles $q = (q_1, \ldots, q_{k-1})$ for a fixed number $k = 2^m$ of cells, with $q_j = Q(j/k)$. For this situation the above result for $B_n$ implies for the frequentist estimator $q^*_j = F_n^{-1}(j/k)$ that

$$\sqrt n(q^*_j - q^0_j) \to_d q_0(j/k)W^0(j/k) \quad \text{for } j = 1, \ldots, k - 1,$$

with $q^0_j = Q_0(j/k)$ being the real underlying quantile. Our next result provides a Bernshtein-von Mises mirror result to this.

Proposition 8.1. Consider any quantile pyramid prior $\Pi_m(q)$ for the quantiles $q = (q_1, \ldots, q_{k-1})$, with the number of cells $k = 2^m$ being fixed, and let the pseudo-posterior distribution of $q$ be defined in terms of the multinomial likelihood $\tilde L_n(q)$ of (11). Then with probability 1 the pseudo-posterior distribution of $q$ is such that the vector with components

$$C_{n,j} = \sqrt n(q_j - q^*_j) = \sqrt n\{Q(j/k) - F_n^{-1}(j/k)\}$$

converges to that of $C_j = W^0(j/k)/f_0(q^0_j)$, for $j = 1, \ldots, k - 1$.

Proof. Write $q_j = q^*_j + \gamma_j/\sqrt n$ for $j = 1, \ldots, k - 1$. Then

$$F_n(q_j) = F_0(q_j) + A_n(q_j)/\sqrt n = F_0(q^*_j) + \{f_0(q^*_j)\gamma_j + A_n(q^*_j)\}/\sqrt n + o_p(n^{-1/2}),$$

which implies that $\sqrt n\{F_n(q_j) - j/k\}$ can be written

$$\sqrt n\{F_0(q^*_j) - j/k\} + f_0(q^*_j)\gamma_j + A_n(q^*_j) + o_p(1) = f_0(q^*_j)\gamma_j + o_p(1).$$

Consequently, with probability 1,

$$\sqrt n\{F_n(q_{j-1}, q_j] - 1/k\} = f_0(q^0_j)\gamma_j - f_0(q^0_{j-1})\gamma_{j-1} + o_p(1)$$

since $q^*_j \to q^0_j$ a.s. for $j = 1, \ldots, k$. Here we write $\gamma_0 = 0$ and $\gamma_k = 0$. From the likelihood approximation (12), the pseudo-posterior density of $(\gamma_1, \ldots, \gamma_{k-1})$ is proportional to $\Pi_m(q^* + \gamma/\sqrt n)\tilde L_n(q^* + \gamma/\sqrt n)$, where, up to further factors that vanish in importance,

$$\tilde L_n(q^* + \gamma/\sqrt n) \doteq \exp\Bigl[-\tfrac12 k\sum_{j=1}^k \{f_0(q^0_j)\gamma_j - f_0(q^0_{j-1})\gamma_{j-1}\}^2\Bigr] = \exp(-\tfrac12\phi^{\rm t}\Sigma_k^{-1}\phi), \tag{18}$$

say; the underlying convergence is uniform over all balls $\|\gamma\| \le c$. Here $\phi$ is the $(k-1)$-vector with components $\phi_j = f_0(q^0_j)\gamma_j - f_0(q^0_{j-1})\gamma_{j-1}$, and

$$\Sigma_k = \begin{pmatrix} k^{-1}(1 - k^{-1}) & \cdots & -k^{-2} \\ \vdots & \ddots & \vdots \\ -k^{-2} & \cdots & k^{-1}(1 - k^{-1}) \end{pmatrix} \quad \text{with} \quad \Sigma_k^{-1} = \begin{pmatrix} 2k & \cdots & k \\ \vdots & \ddots & \vdots \\ k & \cdots & 2k \end{pmatrix}.$$

This implies the statement of the proposition, in view of the multinomial structure of the covariances of a Brownian bridge. □
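The matrix facts used at the end of the proof are easy to confirm numerically. The following sketch builds the multinomial-type covariance matrix, its claimed inverse, and the covariances of Brownian bridge increments over cells of length $1/k$; all function names are illustrative.

```python
def multinomial_cov(k):
    """(k-1) x (k-1) matrix: k^{-1}(1 - k^{-1}) on the diagonal, -k^{-2} elsewhere."""
    d = k - 1
    return [[(1.0 / k) * (1.0 - 1.0 / k) if r == c else -1.0 / k ** 2
             for c in range(d)] for r in range(d)]


def claimed_inverse(k):
    """(k-1) x (k-1) matrix: 2k on the diagonal, k elsewhere."""
    d = k - 1
    return [[2.0 * k if r == c else float(k) for c in range(d)] for r in range(d)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[r][i] * b[i][c] for i in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def bridge_increment_cov(i, j, k):
    """Cov of W0(i/k) - W0((i-1)/k) and W0(j/k) - W0((j-1)/k) for a Brownian bridge."""
    cov = lambda s, t: min(s, t) - s * t
    return (cov(i / k, j / k) - cov(i / k, (j - 1) / k)
            - cov((i - 1) / k, j / k) + cov((i - 1) / k, (j - 1) / k))


def quad_form_two_ways(phi, k):
    """Return (k * sum_{j=1}^k phi_j^2, phi' Sigma_k^{-1} phi), with phi_k = -sum(phi)."""
    full = list(phi) + [-sum(phi)]
    direct = k * sum(x * x for x in full)
    inv = claimed_inverse(k)
    d = k - 1
    matrix = sum(phi[r] * inv[r][c] * phi[c] for r in range(d) for c in range(d))
    return direct, matrix
```

The constraint $\phi_k = -\sum_{j<k}\phi_j$ used in `quad_form_two_ways` reflects $\gamma_0 = \gamma_k = 0$, so that the $k$ components of $\phi$ sum to zero.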

The above implies that the pseudo-posterior quantile process

$$C_n(y) = \sqrt n\{Q(y) - Q^*(y)\} = \sqrt n\{Q(y) - F_n^{-1}(y)\} \tag{19}$$

is such that the pseudo-posterior distribution of $C_n(y)$ tends to $C(y) = q_0(y)W^0(y)$ for $y$ at positions $1/k, 2/k, \ldots, (k-1)/k$, as long as the resolution level is fixed with $k = 2^m$ cells, for any pyramid prior. It also follows, by taking expectations, that the Bayes estimator ${\rm E}\{Q(y) \mid {\rm data}\}$ becomes equivalent to the frequentist estimator $F_n^{-1}(y)$ for large $n$, under these conditions, for the $y = j/k$ positions.

Remark. We do anticipate that there is a stronger Bernshtein-von Mises theorem that under conditions somewhat stronger than those of Proposition 7.1 will imply that the full process $C_n$ of (19) will converge to $C = q_0(\cdot)W^0(\cdot)$, inside each of the Skorokhod spaces $D_L[\varepsilon, 1 - \varepsilon]$. In a technical report version of the present article we have provided details of arguments that combine to formulate the conjecture that for the Beta quantile pyramid with $V_{m,j}$'s taken as Beta($\tfrac12 a_m, \tfrac12 a_m$), and if the density $f_0$ is bounded on $[0,1]$, then conditions

$$k_n \to \infty, \quad k_n/\sqrt n \to 0, \quad m a_m/\sqrt n \to 0$$

secure $C_n \to_d C$ in the described sense.

9. Discussion and concluding remarks. We end our article with a list of concluding comments, pertaining to various aspects of our quantile pyramid processes.

9.1. What is the F of a quantile pyramid Q like? It is in general not possible to understand the distribution function $F$ generated from $Q$ in terms of, for example, analytic expressions for means or variances. It is best understood in terms of distributing mass to random partitions and relying on existence theorems, as discussed in Section 2, in analogy to what is done if $F$ had been generated by a Polya tree.

Doss and Gill (1992) provided a machinery for bringing weak convergence results in the $F$ domain over to the $Q = F^{-1}$ domain, via compact differentiability of the inverse functional transform. Interestingly, one may now borrow their techniques to go the other way, starting with Proposition 8.1 and the $C_n$ process of (19). The result is another Bernshtein-von Mises theorem, stating that the posterior distribution of $\sqrt n(F - F_n)$, where $F$ is the random distribution function stemming from a quantile pyramid $Q$, must tend to the Brownian bridge $W^0(F_0(\cdot))$, under mild conditions.

9.2. Semiparametric models and quantile regression. In Section 2 we briefly pointed to quantile processes of the type $Q(y) = \mu + \sigma\Phi^{-1}(Q_{\rm uni}(y))$, which for given $(\mu, \sigma)$ describes a prior situated at the ${\rm N}(\mu, \sigma^2)$ quantile function. By in addition having a prior on $(\mu, \sigma)$ one has a semiparametric Bayesian construction for handling an uncertain distribution about the normal. The posterior distribution can be established via a Metropolis-Hastings algorithm based around the likelihood function at level $m$ given by

$$L_n(\mu, \sigma, q_{\rm uni}) = \prod_{i=1}^n \frac{1}{\sigma}\phi\Bigl(\frac{x_i - \mu}{\sigma}\Bigr) \prod_{j=1}^k \Bigl(\frac{1}{k}\,\frac{1}{q_{{\rm uni},j} - q_{{\rm uni},j-1}}\Bigr)^{N_j(q_{\rm uni})},$$

where now $q_{{\rm uni},j} = Q_{\rm uni}(j/2^m)$ and $N_j(q_{\rm uni})$ is the number of observations in $(\mu + \sigma\Phi^{-1}(q_{{\rm uni},j-1}), \mu + \sigma\Phi^{-1}(q_{{\rm uni},j})]$.

Similarly one may work with quantile regression problems, of the type $Q_i(y) = a + bx_i + \sigma\Phi^{-1}(Q_{\rm uni}(y))$, with a prior on $(a, b, \sigma)$ independent of the $Q_{\rm uni}$ pyramid. This would be a semiparametric construction more general in spirit than that of Kottas and Gelfand (2001), who work with the Dirichlet process.

9.3. Dependent quantile pyramids. There are various statistically important problems associated with dependent quantile functions, for example, in finance. This might in the present context call for constructing dependent quantile pyramids, for which there are several possibilities. A particular version is as follows, elaborating on the idea that the $V_{m,j}$'s for two pyramids can be made dependent:

$$V_{m,j} = G^{-1}(\Phi(N_{m,j}); \tfrac12 a_m, \tfrac12 a_m), \quad V'_{m,j} = G^{-1}(\Phi(N'_{m,j}); \tfrac12 a_m, \tfrac12 a_m),$$

where $N_{m,j}$ and $N'_{m,j}$ are standard normals with correlation $\rho_m$, say, and with $G$ being the cumulative distribution function for the Beta distribution. This leads to two dependent Beta quantile pyramids $Q$ and $Q'$. More generally time series of quantile pyramids can be worked with through suitable time series models for the underlying $V_{m,j}$'s.

9.4. Asymptotics for the linear interpolation model. Proposition 8.1 and the anticipated process generalization described in the remark following it relate to the multinomial substitute likelihood $\tilde L_n$ of (11). Results of a somewhat different nature can be reached with the linear interpolation model based likelihood $L_n(q)$ of (10), and these modified statements need different proofs. Importantly, for a fixed fineness level $m$, the two quantile likelihoods (10) and (11) are concerned with two different versions of quantiles; the first is maximized by estimators that tend to the least false quantiles $q^0 = (q^0_1, \ldots, q^0_{k-1})$ that minimize the distance function $\lambda(q)$ of Section 5.1, whereas the second is maximized by estimators that tend to real underlying quantiles, as explained in Section 5.2. The difference between the pseudo-quantiles and real quantiles goes to zero as the level $m$ increases, as is implicit in Proposition 7.1. There are, however, "cube root asymptotics" that govern the large-sample behavior of distributions associated with the linear interpolation likelihood; see Hjort (2007).

9.5. More general quantile processes. Our $\Pi_m(q)$ priors for quantiles $Q_m$ have been constructed in a natural pyramidal fashion, and we saw in Sections 5 and 6 that the natural updating mechanisms involved likelihoods that factorized in precisely this way. More general constructions can also be worked with, however, using techniques and results from our article. One may work with $Q(1/k), \ldots, Q((k-1)/k)$ for $k$ different from the pyramid's $2^m$, with methods of Section 6 still applying, and other constructions for building suitable $\Pi_m(q_1, \ldots, q_{k-1})$ may be contemplated, as, for example, setting $q_j$ equal to $F(j/k)$ for a random distribution function $F$ on the unit interval. Our pyramids correspond to special cases of such constructions.

9.6. Further quantilian quantities. In our article we have developed and discussed nonparametric Bayesian tools for analyzing a quantile distribution $Q$. This is a fundamental statistical quantity, and other quantities of importance depend naturally on $Q$. Among these are the Lorenz curve

$$L(y) = \frac{\int_0^y Q(u)\,du}{\int_0^1 Q(u)\,du} \quad \text{for } 0 \le y \le 1$$

and the Gini index $G = 2\int_0^1 \{y - L(y)\}\,dy$. Since we are able to obtain posterior samples of the full $Q$ curve, with a quantile pyramid as prior, such may be used to carry out inference for, for example, the Gini index. There are also important procedures for comparing two populations in terms of their quantile functions, including Doksum's (1974) shift function $D(x) = F_2^{-1}(F_1(x)) - x$ and Parzen's (1979) comparison distribution $\pi(y) = F_2(F_1^{-1}(y))$. Here quantile pyramids may be used as priors for $Q_1 = F_1^{-1}$ and $Q_2 = F_2^{-1}$, and Bayes analysis via posterior samples of the $D(x)$ and $\pi(y)$ curves may be performed via our methods. Hjort and Petrone (2006) give a detailed analysis of these matters for the special case of Dirichlet process priors.
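For a piecewise-linear quantile function, as produced by a level $m$ pyramid, the Lorenz curve and the Gini index can be computed exactly from the quantile values. The following is a sketch under the assumption that a posterior sample is stored as `q = [Q(0), Q(1/k), ..., Q(1)]` for a nonnegative variable with positive mean; the function name is illustrative.

```python
def lorenz_and_gini(q):
    """Lorenz curve values L(j/k) and Gini index for a piecewise-linear Q.

    q = [Q(0), Q(1/k), ..., Q(1)]. Returns (lorenz, gini), where
    lorenz[j] = L(j/k) and gini = 1 - 2 * int_0^1 L(y) dy.
    """
    k = len(q) - 1
    cum = [0.0]            # cum[j] = int_0^{j/k} Q(u) du (trapezoid, exact here)
    mid = []               # int_0^{(j + 1/2)/k} Q(u) du
    for j in range(k):
        mid.append(cum[-1] + (3.0 * q[j] + q[j + 1]) / (8.0 * k))
        cum.append(cum[-1] + (q[j] + q[j + 1]) / (2.0 * k))
    total = cum[-1]
    lorenz = [c / total for c in cum]
    # L is piecewise quadratic, so Simpson's rule on each cell is exact.
    area = sum((cum[j] + 4.0 * mid[j] + cum[j + 1]) / (6.0 * k)
               for j in range(k)) / total
    return lorenz, 1.0 - 2.0 * area
```

Applying the function to each posterior draw of $(q_1, \ldots, q_{k-1})$, with the endpoints appended, gives a posterior sample of the Gini index; note that $1 - 2\int_0^1 L(y)\,dy$ equals $2\int_0^1 \{y - L(y)\}\,dy$.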